Missing-Values Adjustment for Mixed-Type Data
Authors
Abstract
We propose a new method for the single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of a record. We focus in particular on data sets that mix numeric, ordinal, binary, and categorical variables. Our technique is a variant of the popular nearest-neighbor hot-deck imputation (NNHDI), in which “nearest” is defined by a global distance obtained as a convex combination of the distance matrices computed separately for each type of variable. We address the problem of weighting these partial distance matrices so that the weights reflect their significance, reliability, and statistical adequacy. We compare the performance of several weighting schemes under a variety of settings, in combination with imputing the least power mean of the Box-Cox-transformed donor values. Analyses of simulated and real data sets show that this approach is appropriate. Our main contribution is to demonstrate that mixed data can be optimally combined to allow accurate reconstruction of missing values in the target variable, even when some data are absent from other fields of the record.
Similar resources
Performance evaluation of different estimation methods for missing rainfall data
There are numerous methods for estimating missing values, and which one is used depends on the data type and on regional climatic characteristics. In this research, part of the monthly precipitation data from the Sarab synoptic station, East Azerbaijan province, Iran, was randomly treated as missing. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...
A Model for Mixed Continuous and Discrete Responses with Possibility of Missing Responses
A model for missing data in mixed binary and continuous responses, which can be used on cross-sectional data, is presented. In this model, the response indicator for the binary response can depend on the continuous response. A closed form for the likelihood is found. For data with a complicated pattern of missing responses, some new residuals are also proposed. The model of multiplicative heter...
Dynamic Clustering-Based Estimation of Missing Values in Mixed Type Data
The appropriate choice of a method for imputation of missing data becomes especially important when the fraction of missing values is large and the data are of mixed type. The proposed dynamic clustering imputation (DCI) algorithm relies on similarity information from shared neighbors, where mixed type variables are considered together. When evaluated on a public social science dataset of 46,04...
A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values
Missing values and dropouts are common issues in longitudinal studies in all areas of medicine and public health. Intent-to-treat (ITT) analysis has become a widely accepted method for the analysis of controlled clinical trials. In most controlled clinical trials, some patients do not complete their intended follow-up according to the protocol for a variety of reasons; this problem generates mis...
CERAMIC: Case-Control Association Testing in Samples with Related Individuals, Based on Retrospective Mixed Model Analysis with Adjustment for Covariates
We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to ...